Percolation in protein sequence space

Authors

  • Patrick C F Buchholz
  • Silvia Fademrecht
  • Jürgen Pleiss
Abstract

The currently known protein sequences are not distributed evenly in sequence space, but cluster into families. Analyzing the cluster size distribution gives a glimpse of the large and unknown extant protein sequence space, which has been explored during evolution. For six protein superfamilies with different fold and function, the cluster size distributions followed a power law with slopes between 2.4 and 3.3, which represent upper limits to the cluster distribution of extant sequences. The power law distribution of cluster sizes is in accordance with percolation theory and strongly supports connectedness of extant sequence space. Percolation of extant sequence space has three major consequences: (1) It transforms our view of sequence space into that of a highly connected network where each sequence has multiple neighbors, and each pair of sequences is connected by many different paths. A high degree of connectedness is a necessary condition for efficient evolution, because it overcomes the possible blockage by sign epistasis and reciprocal sign epistasis. (2) The Fisher exponent is an indicator of connectedness and saturation of the sequence space of each protein superfamily. (3) All clusters are expected to be connected by extant sequences that become apparent as a larger portion of extant sequence space becomes known. Being linked to biochemically distinct homologous families, bridging sequences are promising enzyme candidates for applications in biotechnology because they are expected to have substrate ambiguity or catalytic promiscuity.
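The power-law claim in the abstract can be illustrated with a small numerical sketch. The abstract does not say how the slopes were fitted, so the estimator below is an assumption: it uses the standard continuous maximum-likelihood formula for a power-law exponent, and it is tried out on synthetic data rather than on real cluster sizes. The function names `sample_power_law` and `fit_exponent` are hypothetical, chosen only for this illustration.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n synthetic 'cluster sizes' from p(x) ~ x**(-alpha), x >= x_min.

    Uses inverse-transform sampling on the continuous power law.
    Synthetic data only; not taken from the paper.
    """
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power below is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def fit_exponent(sizes, x_min=1.0):
    """Continuous maximum-likelihood estimate of the power-law exponent.

    alpha_hat = 1 + n / sum(ln(x_i / x_min)), using only sizes >= x_min.
    Returns (alpha_hat, approximate standard error).
    Assumption: sizes are treated as continuous; for integer cluster
    sizes a discrete estimator would be more appropriate.
    """
    tail = [s for s in sizes if s >= x_min]
    if not tail:
        raise ValueError("no cluster sizes at or above x_min")
    log_sum = sum(math.log(s / x_min) for s in tail)
    if log_sum == 0.0:
        raise ValueError("all sizes equal x_min; exponent is undefined")
    alpha_hat = 1.0 + len(tail) / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(len(tail))
    return alpha_hat, std_err


if __name__ == "__main__":
    sizes = sample_power_law(20000, alpha=2.5)
    alpha_hat, std_err = fit_exponent(sizes)
    print(f"estimated exponent: {alpha_hat:.2f} +/- {std_err:.2f}")
```

A maximum-likelihood estimate is used here instead of a straight-line fit on a log-log histogram because the latter is sensitive to the choice of bins. Either way, the estimate depends on the lower cutoff `x_min`, which has to be chosen for real data.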

Similar resources

A model for modified electrode with carbon nanotube composites using percolation theory in fractal space

We introduce a model for predicting the behavior of electrodes modified with carbon nanotubes in a polymer medium. Polymer composites of this kind have been developed in recent years, and experimental data for their percolation threshold are available. We construct a model based on percolation theory and fractal dimensions, using the experimental percolation threshold to calculate the moments of c...

Expression and Secretion of Human Granulocyte Macrophage-Colony Stimulating Factor Using Escherichia coli Enterotoxin I Signal Sequence

With the aim of secreting human granulocyte macrophage-colony stimulating factor (hGM-CSF) in Escherichia coli, hGM-CSF cDNA was fused in-frame next to the signal sequence of ST toxin (ST-I) of enterotoxigenic E. coli, containing 53 or 19 amino acids of signal peptide. The fused STsig::hGM-CSF coding fragments were inserted into a T7-based expression plasmid. The recombinant plasmids were ...

Design and construction of a clone expressing the anticoagulant drug desirudin (hirudin) extracellularly in Escherichia coli

Background and purpose: Hirudin is a polypeptide of 65-66 amino acids that is secreted as an anticoagulant from the salivary glands of the medicinal leech. It is a very potent inhibitor of thrombin and is highly effective in preventing arterial and venous thrombosis, so it can compete with heparin. The aim of this study was to add a pelB signal peptide to pET-22b plasmid and to investi...

Modeling the percolation of annotation errors in a database of protein sequences

Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to ho...

A new sequence space and norm of certain matrix operators on this space

In the present paper, we introduce the sequence space \[ l_p(E,\Delta) = \left\{ x = (x_n)_{n=1}^{\infty} : \sum_{n=1}^{\infty} \left| \sum_{j \in E_n} x_j - \sum_{j \in E_{n+1}} x_j \right|^p < \infty \right\}, \] where $E=(E_n)$ is a partition of finite subsets of the positive integers and $p \ge 1$. We investigate its topological properties and inclusion relations. Moreover, we consider the problem of fin...

Journal title:

Volume 12, Issue

Pages -

Publication date: 2017